The maximum clique problem is a well-known NP-Hard problem with applications in data mining,
network analysis, informatics, and many other areas. Although there exist several algorithms with acceptable
runtimes for certain classes of graphs, many of them are infeasible for massive graphs. We present a new
exact algorithm that employs novel pruning techniques to very quickly find maximum cliques in large sparse
graphs. Extensive experiments on several types of synthetic and real-world graphs show that our new
algorithm is up to several orders of magnitude faster than existing algorithms for most instances. We also present
a heuristic variant that runs orders of magnitude faster than the exact algorithm, while providing optimal or
near-optimal solutions.

1. Introduction

A clique in an undirected graph is a subset of vertices in which every two vertices are 
adjacent to each other. The maximum clique problem seeks to find a clique of the largest 
possible size in a given graph. 
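
As a concrete illustration of these definitions, the following minimal sketch checks whether a vertex set is a clique and finds a maximum clique by exhaustive search. The adjacency-dictionary representation, the function names, and the small example graph are assumptions made for illustration only; exhaustive search is feasible only for tiny graphs.

```python
from itertools import combinations

def is_clique(adj, vertices):
    """Return True if every two distinct vertices in `vertices` are adjacent.

    `adj` maps each vertex to the set of its neighbors (illustrative format).
    """
    return all(v in adj[u] for u, v in combinations(vertices, 2))

def max_clique_bruteforce(adj):
    """Return one largest clique by trying vertex subsets from largest to smallest."""
    vertices = list(adj)
    for k in range(len(vertices), 0, -1):
        for candidate in combinations(vertices, k):
            if is_clique(adj, candidate):
                return set(candidate)
    return set()

# Hypothetical example: a triangle {0, 1, 2} with a path 2-3-4 attached.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
```

The exhaustive search examines up to 2^n subsets, which is what motivates the pruning-based algorithms discussed in the rest of the paper.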

The maximum clique problem, and the related maximal clique and clique enumera- 
tion problems, find applications in diverse areas. Some examples include data mining 
[13, 35, 38], information retrieval [2], social networks [15], bioinformatics [24], com- 
puter vision [18], coding [8], and economics [5]. An example of its application can be 
given using data mining or information retrieval, where one needs to retrieve data that 
are considered similar based on some given metric. A graph is constructed with vertices 
corresponding to data items and edges connecting similar items. Finding a clique in such 
a graph gives a cluster of similar data. Such problems also arise in various other areas 
including identification and classification of new diseases based on symptom correlation 
[7], pattern recognition [30], and bioinformatics [24]. More recently, the maximum clique 
problem has seen important applications in social network analysis, primarily in commu- 
nity detection [15, 28, 32]. More examples of application areas for clique problems can 
be found in [17, 29]. 

The maximum clique problem is NP-Hard [16]. Most exact algorithms for solving it employ
some form of branch-and-bound approach. While branching systematically searches
for all candidate solutions, bounding (also known as pruning) discards fruitless candidates
based on a previously computed bound. An early example of a simple and effective
branch-and-bound algorithm for the maximum clique problem is one by Carraghan and
Pardalos [9]. More recently, Ostergard [27] introduced an improved algorithm and demonstrated
its relative advantages via computational experiments. Tomita and Seki [34], and
later Konc and Janezic [20], use upper bounds computed using vertex coloring to enhance
the branch-and-bound approach. Other examples of branch-and-bound algorithms for the
clique problem include [3, 6, 33]. In recent work, Prosser [31] compared various
exact algorithms for the maximum clique problem.

An attractive feature of the algorithms of [9] and [27] is that they are simple to
implement. However, their runtimes can be prohibitive for very large graphs.
Furthermore, both algorithms, as well as the algorithms from [34] and [20], are inherently
sequential or otherwise difficult to parallelize. The ease with which an algorithm can be
parallelized is important for handling large-scale graphs in emerging applications, where
graphs with millions of vertices or more are quite common [21].

In this paper, we present a new exact branch-and-bound algorithm for the maximum
clique problem that employs several new pruning strategies in addition to those in [9],
[27], [34] and [20], making it suitable for massive graphs. We also present a heuristic
that is based on pruning techniques similar to those of the exact algorithm but runs much
faster: the heuristic follows just one of the "paths" in the search space, and as a result its
complexity is nearly linear in the size of the graph, in contrast to the exact algorithm,
whose worst-case complexity is exponential. Both the exact algorithm and the heuristic
are well-suited for parallelization. The algorithms are discussed in detail in Section 3.

In Section 4 we present an extensive experimental analysis comparing the performance
of our algorithms with the algorithm of Carraghan and Pardalos [9], the algorithm of
Ostergard [27] and the algorithm of Konc and Janezic [20]. The workings of the latter
three algorithms are reviewed in Section 2. Our testbed includes large-scale real-world
graphs drawn from various application domains, large-scale synthetic graphs representing
various structures, and DIMACS benchmark graphs. The new exact algorithm is found to
be up to orders of magnitude faster on large, sparse graphs and of comparable runtime on
denser graphs. The heuristic in turn is found to run several orders of magnitude faster than
the exact algorithm, while delivering solutions that are optimal or near-optimal in most
cases. We have made our implementations publicly available at
http://cucis.ece.northwestern.edu/projects/MAXCLIQUE/.



2. Related Previous Algorithms 

Given a simple undirected graph G, the maximum clique can clearly be obtained by
enumerating all of the cliques present in it and picking the largest of them. Carraghan
and Pardalos [9] introduced a simple-to-implement algorithm that avoids enumerating all
cliques and instead works with a significantly reduced partial enumeration. The reduction
in enumeration is achieved via a pruning strategy which reduces the search space
tremendously. The algorithm works by performing, at each step i, a depth-first search from vertex
v_i, where the goal is to find the largest clique containing the vertex v_i. At each depth of
the search, the algorithm compares the number of remaining vertices that could potentially
constitute a clique containing vertex v_i against the size of the largest clique encountered
thus far. If that number is found to be smaller, the algorithm backtracks (the search is pruned).
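
The search just described can be sketched as follows. This is a simplified illustration in the spirit of the Carraghan-Pardalos scheme, not the authors' implementation of [9]; the adjacency-dictionary format, the function name, and the use of sorted vertex labels as the processing order are assumptions.

```python
def cp_max_clique_size(adj):
    """Size of a maximum clique via depth-first search with size-based pruning.

    `adj` maps each vertex to the set of its neighbors (illustrative format).
    """
    order = sorted(adj)   # fixed processing order v_1, ..., v_n (assumed)
    best = 0              # size of the largest clique found so far

    def expand(candidates, size):
        nonlocal best
        if not candidates:
            best = max(best, size)
            return
        while candidates:
            # Even adding every remaining candidate cannot beat `best`: backtrack.
            if size + len(candidates) <= best:
                return
            u = candidates.pop(0)
            # Only candidates adjacent to u can extend a clique containing u.
            expand([w for w in candidates if w in adj[u]], size + 1)

    for i, v in enumerate(order):
        # Search for cliques whose earliest vertex in the order is v.
        expand([w for w in order[i + 1:] if w in adj[v]], 1)
    return best
```

Restricting each step to later neighbors ensures that every clique is explored only from its earliest vertex, so no clique is enumerated twice.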

Ostergard [27] devised an algorithm that adds a further pruning strategy to
the one by Carraghan and Pardalos. The opportunity for the new pruning strategy is created
by reversing the order in which the search is done by the Carraghan-Pardalos
algorithm. This allows for additional pruning with the help of some auxiliary bookkeeping.
Experimental results in [27] showed that the Ostergard algorithm is faster than the
Carraghan-Pardalos algorithm on random and DIMACS benchmark graphs [19]. However, the new
pruning strategy used in this algorithm is intimately tied to the order in which vertices are
processed, introducing an inherent sequentiality into the algorithm.

A number of existing branch-and-bound algorithms for maximum clique use a vertex- 
coloring of the graph to obtain an upper bound on the maximum clique. A vertex-coloring 
of a graph is an assignment of colors to vertices such that a pair of adjacent vertices re- 
ceive different colors. Clearly, the number of colors used gives an upper bound on the 
maximum clique of the graph, which can be used to reduce the search space. A popular 
and recent algorithm based on this idea is the algorithm of Tomita and Seki [34] (known
as MCQ). More recently, Konc and Janezic [20] presented an improved version of MCQ, 
known as MaxCliqueDyn (MCQD and MCQD+CS), that involves the use of tighter, com- 
putationally more expensive upper bounds applied on a fraction of the search space. 
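
To illustrate why a coloring yields an upper bound, the sketch below uses a plain greedy coloring in largest-degree-first order. This is a stand-in chosen for brevity, not the coloring procedure of MCQ or MaxCliqueDyn; the function name and the adjacency-dictionary format are assumptions.

```python
def greedy_coloring_bound(adj):
    """Number of colors used by a greedy coloring: an upper bound on the clique size.

    The vertices of a clique are pairwise adjacent and so must all receive
    distinct colors; hence no clique can be larger than the number of colors.
    """
    color = {}
    # Largest-degree-first order is a common choice (an assumption here).
    for v in sorted(adj, key=lambda x: len(adj[x]), reverse=True):
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:   # smallest color not used by an already-colored neighbor
            c += 1
        color[v] = c
    return max(color.values(), default=-1) + 1
```

The bound need not be tight: a 5-cycle has maximum clique size 2, yet any proper coloring of it needs 3 colors.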

3. The New Algorithms 

We describe in this section new algorithms that overcome the aforementioned
shortcomings: the new algorithms use additional pruning strategies, maintain simplicity,
and avoid a sequential computational order. Before going into the details of the algorithms,
we introduce some notation used throughout the paper. We identify the n vertices of the
input graph G = (V, E) as {v_1, v_2, ..., v_n}. The set of vertices adjacent to a vertex v_i,
the set of its neighbors, is denoted by N(v_i). The cardinality of N(v_i), its degree, is
denoted by d(v_i).

3.1. The Exact Algorithm

The maximum clique in a graph can be found by computing the largest clique containing 
each vertex and picking the largest among these. A key element of our exact algorithm is 
that during the search for the largest clique containing a given vertex, vertices that cannot 
form cliques larger than the current maximum clique are pruned, in a hierarchical fashion. 
The method is outlined in detail in Algorithm 1. Throughout the algorithm, the variable 
max stores the size of the maximum clique found thus far. Initially it is set to be equal to 
the lower bound lb provided as an input parameter, and it gives the maximum clique size 
when the algorithm terminates. 

To obtain the largest clique containing a vertex v_i, it is sufficient to consider only the
neighbors of v_i. The main routine MaxClique thus generates for each vertex v_i ∈ V a
set U ⊆ N(v_i) (neighbors of v_i that survive pruning) and calls the subroutine Clique on
U. The subroutine Clique goes through every relevant clique containing v_i in a recursive
fashion and returns the largest. The subroutine is similar to the Carraghan-Pardalos
algorithm [9]. We use size to maintain the size of the clique found at any point through
the recursion. Since we start with a clique of just one vertex, the value of size is set to be
one initially when the subroutine Clique is called (Line 10 of Algorithm 1).

Our algorithm consists of several pruning steps. The pruning in Line 4 of MaxClique
(Pruning 1) filters vertices having strictly fewer neighbors than the size of the
maximum clique already computed. These vertices can be safely ignored, since even if a
clique were to be found, its size would not be larger than max. While forming the neighbor
list U for a vertex v_i, we include only those of v_i's neighbors for which the largest
clique containing them has not been found (Line 7, Pruning 2), to avoid recomputing
previously found cliques. Furthermore, the pruning in Line 8 (Pruning 3) excludes vertices
v_j ∈ N(v_i) that have degree less than the current value of max, since any such vertex
could not form a clique of size larger than max. The pruning strategy in Line 7 of subroutine
Clique (Pruning 4) checks for the case where even if all vertices of U were added
to get a clique, its size would not exceed that of the largest clique encountered so far in
the search, max. The pruning in Line 11 of Clique (Pruning 5) reduces the number of
comparisons needed to generate the intersection set in Line 12. Pruning 4 is used in most
existing algorithms, whereas pruning steps 1, 2, 3 and 5 are new.

Algorithm 1 Algorithm for finding the maximum clique of a given
graph. Input: Graph G = (V, E), lower bound on clique lb (default, 0).
Output: Size of maximum clique.

 1: procedure MaxClique(G = (V, E), lb)
 2:     max ← lb
 3:     for i : 1 to n do
 4:         if d(v_i) ≥ max then                          ▷ Pruning 1
 5:             U ← ∅
 6:             for each v_j ∈ N(v_i) do
 7:                 if j > i then                         ▷ Pruning 2
 8:                     if d(v_j) ≥ max then              ▷ Pruning 3
 9:                         U ← U ∪ {v_j}
10:             Clique(G, U, 1)

Subroutine

 1: procedure Clique(G = (V, E), U, size)
 2:     if U = ∅ then
 3:         if size > max then
 4:             max ← size
 5:         return
 6:     while |U| > 0 do
 7:         if size + |U| ≤ max then                      ▷ Pruning 4
 8:             return
 9:         Select any vertex u from U
10:         U ← U \ {u}
11:         N'(u) := {w | w ∈ N(u) ∧ d(w) ≥ max}          ▷ Pruning 5
12:         Clique(G, U ∩ N'(u), size + 1)
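
The following Python sketch mirrors the structure of Algorithm 1 under a few stated assumptions: the graph is an adjacency dictionary, vertex labels are comparable so that their sorted order plays the role of the index i, and the function and variable names are illustrative rather than taken from the authors' C++ code.

```python
def max_clique_exact(adj, lb=0):
    """Size of a maximum clique, or lb if no clique larger than lb exists.

    `adj` maps each vertex to the set of its neighbors (illustrative format).
    """
    best = lb                                   # plays the role of `max`
    deg = {v: len(adj[v]) for v in adj}

    def clique(U, size):
        nonlocal best
        if not U:
            if size > best:
                best = size
            return
        while U:
            if size + len(U) <= best:           # Pruning 4
                return
            u = U.pop()                         # select any vertex u and remove it
            nbrs = {w for w in adj[u] if deg[w] >= best}   # Pruning 5
            clique(U & nbrs, size + 1)

    for v in sorted(adj):
        if deg[v] >= best:                      # Pruning 1
            U = {w for w in adj[v]
                 if w > v                       # Pruning 2: later vertices only
                 and deg[w] >= best}            # Pruning 3
            clique(U, 1)
    return best
```

Because best only grows during the run, the degree filters become stricter as larger cliques are found, which is where the pruning steps gain their effect on sparse graphs.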

3.2. The Heuristic 

The exact algorithm examines for every vertex v_i all relevant cliques containing the vertex
v_i in order to determine the clique of maximum size among them. Our heuristic speeds up
this process by instead examining only a subset of the relevant cliques.

The heuristic is presented in Algorithm 2. The main routine is very similar to the main 
routine in Algorithm 1. The subroutine CliqueHeu considers only the maximum degree 
neighbor at each step instead of recursively considering all neighbors from the set U. 
Since we are looking for the largest clique containing each vertex, the maximum degree 
vertex is more likely to be a member of the largest clique compared to the other vertices. 






Optimization Methods and Software 






Algorithm 2 Heuristic for finding the maximum clique in a graph.
Input: Graph G = (V, E). Output: Approximate size of maximum
clique.

1: procedure MaxCliqueHeu(G = (V, E))
2:     for i : 1 to n do
3:         if d(v_i) ≥ max then
4:             U ← ∅
5:             for each v_j ∈ N(v_i) do
6:                 if d(v_j) ≥ max then
7:                     U ← U ∪ {v_j}
8:             CliqueHeu(G, U, 1)

Subroutine

1: procedure CliqueHeu(G = (V, E), U, size)
2:     if U = ∅ then
3:         if size > max then
4:             max ← size
5:         return
6:     Select a vertex u ∈ U of maximum degree in G
7:     U ← U \ {u}
8:     N'(u) := {w | w ∈ N(u) ∧ d(w) ≥ max}          ▷ Pruning 5
9:     CliqueHeu(G, U ∩ N'(u), size + 1)



The effect of choosing the maximum degree vertex as opposed to any random vertex will 
be analyzed in Section 4.2.2. We note that Turner [36] uses an algorithm similar in spirit 
to the subroutine of Algorithm 2 in his coloring algorithm. 
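
A minimal Python sketch of the heuristic is given below. It assumes an adjacency-dictionary graph, initializes the running maximum to 0 (Algorithm 2 does not show the initialization), unrolls the single-path recursion of CliqueHeu into a loop, and breaks degree ties by vertex label; all of these choices, and the names used, are illustrative assumptions.

```python
def max_clique_heuristic(adj):
    """Greedy lower bound on the maximum clique size (may be suboptimal).

    `adj` maps each vertex to the set of its neighbors (illustrative format).
    """
    best = 0                                   # assumed initial value of `max`
    deg = {v: len(adj[v]) for v in adj}

    for v in adj:
        if deg[v] < best:                      # skip vertices of too small degree
            continue
        U = {w for w in adj[v] if deg[w] >= best}
        size = 1                               # the clique starts as {v}
        while U:
            # Follow a single path: always take a maximum-degree candidate.
            u = max(U, key=lambda w: (deg[w], w))
            U.remove(u)
            U &= {w for w in adj[u] if deg[w] >= best}     # Pruning 5
            size += 1
        best = max(best, size)
    return best
```

Each outer iteration builds exactly one clique, so the returned value is always the size of some clique in the graph, although not necessarily the largest one.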

3.3. Complexity 

The exact algorithm, Algorithm 1, examines for every vertex v_i all candidate cliques
containing the vertex v_i in its search for the largest clique. Its time complexity is exponential
in the worst case. The heuristic, Algorithm 2, loops over the n vertices, each time possibly
calling the subroutine CliqueHeu, which effectively is a loop that runs until the set U
is empty. Clearly, |U| is bounded by the maximum degree Δ in the graph. The subroutine also
includes the computation of a neighbor list, whose runtime is bounded by O(Δ). Thus,
the time complexity of the heuristic is bounded by O(n · Δ²).
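
The counting argument for the heuristic can be written out in one line. Here U_i denotes the set U built for vertex v_i, a symbol introduced only for this illustration:

```latex
T_{\text{heuristic}}
  \;\le\; \sum_{i=1}^{n} \underbrace{|U_i|}_{\le\, \Delta} \cdot \underbrace{O(\Delta)}_{\text{neighbor list}}
  \;=\; O(n \cdot \Delta^{2})
```

For a sparse graph in which Δ is small relative to n, this bound is close to linear in n.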

4. Experiments and Results 

We present in this section experimental results comparing the performance of our
algorithm with the algorithms of Carraghan and Pardalos [9], Ostergard [27], and Konc
and Janezic [20].

We implemented the algorithm of [9] ourselves, whereas for the algorithm of [27],
we used the publicly available cliquer source code [26]. Similarly, for the algorithm
of [20] we used the code MaxCliqueDyn (MCQD, available at
http://www.sicmm.org/~konc/maxclique/). Among the variants available in MCQD, we report results
for the best-performing variant, MCQD+CS (which uses improved coloring
and dynamic sorting).

All our experiments are performed on a Linux workstation running the 64-bit version of Red
Hat Enterprise Linux Server release 6.2, with a 2.00 GHz Intel Xeon E7540 processor.
Our implementations are all in C++, and the codes are compiled using gcc version 4.4.6
with -O3 optimization.

Table 1. Overview of real-world graphs in the testbed and their origins.

Graph               Description

cond-mat-2003 [25]  A collaboration network of scientists posting preprints on
                    the condensed matter archive at www.arxiv.org in the period
                    between January 1, 1995 and June 30, 2003.
email-Enron [22]    A communication network representing email exchanges.
                    Nodes are email addresses and there is a directed edge from
                    node i to node j if at least one email is sent from i to j.
dictionary28 [4]    Pajek network of words.
Fault_639 [1]       A structural problem discretizing a faulted gas reservoir with
                    tetrahedral Finite Elements and triangular Interface Elements.
audikw_1 [11]       An automotive crankshaft model of TETRA elements.
bone010 [37]        A detailed micro-finite element (micro-FE) model of bones
                    representing the porous bone micro-architecture.
af_shell10 [11]     A sheet metal forming simulation network.
as-Skitter [22]     An Internet topology graph from trace routes run daily in 2005.
roadNet-CA [22]     A road network of California. Nodes represent intersections
                    and endpoints and edges represent the roads connecting the
                    intersections or endpoints.
kkt_power [11]      An Optimal Power Flow (nonlinear optimization) network.

4.1. Test Graphs 

Our testbed is grouped in three categories. 

4.1.1. Real-world graphs 

Under this category, we consider 10 graphs (downloaded from the University of Florida
Sparse Matrix Collection [11]) that originate from various real-world applications. Table 1
gives a quick overview of the graphs and their origins.

4.1.2. Synthetic Graphs 

In this category we consider 15 graphs generated using the R-MAT algorithm [10]. The
graphs are subdivided into three categories depending on the structures they represent.

• Random graphs (5 graphs) - Erdos-Renyi random graphs generated using R-MAT
with the parameters (0.25, 0.25, 0.25, 0.25). Denoted with prefix rmat_er.

• Skewed Degree, Type 1 graphs (5 graphs) - graphs generated using R-MAT with the
parameters (0.45, 0.15, 0.15, 0.25). Denoted with prefix rmat_sd1.

• Skewed Degree, Type 2 graphs (5 graphs) - graphs generated using R-MAT with the
parameters (0.55, 0.15, 0.15, 0.15). Denoted with prefix rmat_sd2.
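
For readers unfamiliar with R-MAT, the sketch below shows the basic recursive-quadrant idea behind the generator: each edge is placed by descending through the adjacency matrix, choosing one of four quadrants at every level with the given probabilities. It is an illustrative simplification, not the generator used for the testbed; the function name, the mapping of the four parameters to quadrants, the removal of self-loops and duplicate edges, and the cap on attempts are all assumptions.

```python
import random

def rmat_edges(scale, num_edges, probs=(0.25, 0.25, 0.25, 0.25), seed=0):
    """Generate up to num_edges distinct undirected edges on 2**scale vertices."""
    a, b, c, _d = probs                 # quadrant probabilities (assumed order)
    rng = random.Random(seed)
    edges = set()
    attempts = 0
    max_attempts = 100 * num_edges      # illustrative cap so the loop always ends
    while len(edges) < num_edges and attempts < max_attempts:
        attempts += 1
        u = v = 0
        for _ in range(scale):          # one quadrant choice per bit of the vertex id
            r = rng.random()
            u <<= 1
            v <<= 1
            if r < a:
                pass                    # top-left quadrant: both bits stay 0
            elif r < a + b:
                v |= 1                  # top-right quadrant
            elif r < a + b + c:
                u |= 1                  # bottom-left quadrant
            else:
                u |= 1                  # bottom-right quadrant
                v |= 1
        if u != v:                      # drop self-loops; the set drops duplicates
            edges.add((min(u, v), max(u, v)))
    return edges
```

With equal probabilities every matrix cell is equally likely, which gives Erdos-Renyi-like graphs; unequal probabilities concentrate edges on a subset of vertices and produce skewed degree distributions.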

4.1.3. DIMACS graphs 

This last category consists of 5 graphs selected from the Second DIMACS Implemen- 
tation Challenge [19]. 

The DIMACS graphs are an established benchmark for the maximum clique problem,
but they are of rather limited size and variation. In contrast, the real-world networks
included in category 1 of the testbed and the synthetic (RMAT) graphs in category 2
represent a wide spectrum of large graphs posing varying degrees of difficulty for testing the
algorithms. The rmat_er graphs have normal degree distribution, whereas the rmat_sd1
and rmat_sd2 graphs have skewed degree distributions and contain many dense local
subgraphs. The rmat_sd1 and rmat_sd2 graphs differ primarily in the magnitude of the maximum
vertex degree they contain; the rmat_sd2 graphs have much higher maximum degree.
Table 2 lists basic structural information (the number of vertices, number of edges and the
maximum degree) about all 30 of the test graphs.

Table 2. Structural properties (the number of vertices, |V|; edges, |E|; and the maximum degree, Δ) of the graphs G in the testbed:
DIMACS Challenge graphs, UF Collection graphs, and RMAT graphs.

Graph          |V|          |E|           Δ
rmat_sd1_5
                                          80
roadNet-CA     1,971,281    2,766,607     12
rmat_sd2_4     1,048,576    8,318,004     9,153
kkt_power      2,063,494    6,482,320     95
rmat_sd2_5     2,097,152    16,645,183    14,066
rmat_er_1      131,072      1,048,515     82
hamming6-4     64           704           22
rmat_er_2      262,144      2,097,104     98
johnson8-4-4   70           1,855         53
rmat_er_3      524,288      4,194,254     94
keller4        171          9,435         124
rmat_er_4      1,048,576    8,388,540     97
c-fat200-5     200          8,473         86
rmat_er_5      2,097,152    16,777,139    102
brock200_2     200          9,876         114

4.2. Results 

Table 3 shows the size of the maximum clique (ω) and the runtimes of our exact algorithm
and the algorithms of Carraghan and Pardalos [9] (CP), Ostergard [27] (cliquer) and Konc
and Janezic [20] (MCQD+CS) for all the graphs in the testbed. The last two columns show
the results of our heuristic: the size of the maximum clique returned and its runtime.

In Section 4.2.1 and Section 4.2.2, we discuss our observations from this table for the
exact algorithm and the heuristic, respectively, but before that we briefly comment on our
experience in using the MaxCliqueDyn code. Unfortunately, the code failed to execute
most of the large instances in our testbed, including the majority of the RMAT and
real-world instances, due to memory management issues in the code. The entries in Table 3
marked with a hyphen (-) show instances for which the code was aborted due to excessive
memory usage. Even for the instances on which it eventually ran successfully, we had to first
modify the graph reader so that it could handle graphs with multiple connected
components.

4.2.1. Exact algorithms 

As expected, our exact algorithm gave the same size of maximum clique as the other 
three algorithms for all test cases. In terms of runtime, its relative performance compared 
to the other three varied in accordance with the advantages afforded by the various pruning 
steps. 

Analysis of pruning steps. Vertices that are discarded by Pruning 1 are skipped in
the main loop of the algorithm, and the largest cliques containing them are not computed.
Pruning 2 avoids recomputing previously computed cliques in the neighborhood of a
vertex. In the absence of Pruning 1, the number of vertices pruned by Pruning 2 would be
bounded by the number of edges in the graph (note that this is more than the total number
of vertices in the graph). While Pruning 3 reduces the size of the input set on which the
maximum clique is to be computed, Pruning 5 brings down the time taken to generate
the intersection set in Line 12 of the subroutine. Pruning 4 corresponds to backtracking.
Unlike Pruning steps 1, 2, 3 and 5, Pruning 4 is used by all three of the other algorithms
in our comparison.

Table 3. Comparison of runtimes (in seconds) of algorithms [9] (CP), [27] (cliquer) and [20] (MCQD+CS) with the time taken by our new
exact algorithm (T_new-exact) for the graphs in the testbed, with the fastest (marked in bold) for each case. An asterisk (*) indicates that
the algorithm did not terminate within 25,000 seconds for that instance. A hyphen (-) indicates that the publicly available implementation by
the authors of the algorithm terminated due to the graph being too large for the implementation to handle. ω denotes the maximum clique size,
ω_new-heuristic the maximum clique size returned by our heuristic, and T_new-heuristic its runtime. For the graph rmat_sd2_5, none
of the algorithms computed the maximum clique size in a reasonable time; the entry is marked N, denoting "Not Known".

Graph          ω     T_CP       T_cliquer  T_MCQD+CS  T_new-exact  ω_new-heuristic  T_new-heuristic
cond-mat-2003                   11.17      2.41       0.011        25               <0.01
email-Enron    20    7.005      15.08      3.70       0.998        18               0.261
dictionary28   26    7.700      32.74      7.69       <0.01        26               <0.01
Fault_639      18    14571.20   4437.14               20.03        18               5.80
audikw_1       36               9282.49    -          190.17       36               58.38
bone010        24               10002.67   -                       24               24.39
af_shell10     15               21669.96   -          50.99        15               10.67
as-Skitter     67    24385.73              -          3838.36      66               27.08
roadNet-CA     4                           -          0.44         4                0.08
kkt_power      11                          -          2.26         11               1.83
               3     256.37     215.18     49.79      0.38         3                0.12
rmat_er_2      3     1016.70    865.18     -          0.78         3                0.24
               3     4117.35    3456.39    -          1.87         3                0.49
                                           -          4.16                          1.44
               14    0.19       <0.01                 0.23         14               <0.01
               11    22.19      0.15       0.02       23.35        11               <0.01
               58    0.60       0.33       0.01       0.93         58               0.04
               12    0.98       0.02       <0.01      1.10         10               <0.01

One of the strengths of our algorithm is its ability to take advantage of pruning in
multiple steps in a hierarchical fashion, allowing for opportunities for one or more of the
steps to kick in and impact performance. In Figure 1 we show the number of vertices
discarded by all the pruning steps of the exact algorithm normalized by the total number
of edges in a graph for the real-world graphs (category 1) in the testbed. We truncate a few
bars at 140%, as their corresponding values are much higher. It can be seen that for
these graphs pruning steps 2 and 5 in particular discard a large percentage of vertices,
potentially resulting in large runtime savings. The general behavior of the pruning steps
Pruning 1, 2, 3 and 5 for the synthetic graphs rmat_er and rmat_sd1 was observed to be
somewhat similar to that depicted in Figure 1 for the real-world graphs. In contrast, for the
DIMACS graphs, the number of vertices pruned in steps Pruning 1, 3 and 5 was observed
to be zero; the numbers in the step Pruning 2 were nonzero, but relatively modest. In the
Appendix, we provide a complete tabulation of the raw numbers for the pruned vertices
in all the steps for all the graphs in the testbed.

Figure 1. Number of "pruned" vertices in the various pruning steps normalized by the number of edges in the graph (in
percent) for the test graphs in category 1 (a few bars are truncated at 140%, as their corresponding values are much higher).

As a result of the differences seen in the effects of the pruning steps, as discussed below, 
the runtime performance of our algorithm (seen in Table 3) compared to the other three 
algorithms varied in accordance with the difference in the structures represented by the 
different categories of graphs in the testbed. 

Real-world Graphs. For most of the graphs in this category, it can be seen that our
algorithm runs several orders of magnitude faster than the other three, mainly due to the
large amount of pruning the algorithm enforced. For the graphs Fault_639, audikw_1 and
af_shell10, Prunings 1, 3 and 5 had relatively small impact, whereas Pruning 2 made
a huge impact. The number of vertices pruned in steps Pruning 1 and 3 varied among
the graphs within the category, ranging from 0.001% for af_shell10 to a staggering 97% for
as-Skitter for the step Pruning 1 (see the table in the Appendix for details).

Synthetic Graphs. For the synthetic graph types rmat_er and rmat_sd1, our algorithm
clearly outperforms the other three by a few orders of magnitude in all cases. This is
also primarily due to the high number of vertices discarded by the new pruning steps. In
particular, for rmat_sd1 graphs, between 30% and 37% of the vertices are pruned just in the
step Pruning 1. For the rmat_sd2 graphs, which have relatively larger maximum clique
and higher maximum degree than the rmat_sd1 graphs, our algorithm is observed to be
faster than CP but slower than cliquer.

DIMACS Graphs. The runtime of our exact algorithm for the DIMACS graphs is in 
most cases comparable to that of CP and higher than that of cliquer and MCQD+CS. For 
these graphs, only Pruning 2 was found to be effective (see the table in the Appendix for 
details), and thus the performance results agree with one's expectation. We include in the 
Appendix timing results on a larger collection of DIMACS graphs. 

It is to be noted that the DIMACS graphs are intended to serve as challenging test
cases for the maximum clique problem, and graphs with such high edge densities and
low vertex counts are rather rare in practice. For example, most of them have between 20
and 1024 vertices with an average edge density of roughly 0.6. In contrast, most real-world
graphs are very large and sparse. Good examples are Internet topology graphs [14],
the web graph [21], social network graphs [12], and the real-world graphs in our testbed.

4.2.2. The Heuristic 

It can be seen that our heuristic runs several orders of magnitude faster than our exact
algorithm, while delivering either optimal or very close to optimal solutions. It gave the
optimal solution on 25 out of the 30 test cases. On the remaining 5 cases where it was
suboptimal, its accuracy ranges from 83% to 99% (on average 93%). Additionally, we
ran the heuristic by choosing a vertex randomly in Line 6 of Algorithm 2 instead of the
one with the maximum degree. We observe that on average, the solution is optimal only
for less than 40% of the test cases compared to 83% when selecting the maximum degree
vertex.

Figure 2. Runtime (normalized, mean) comparison between various algorithms. For each category of graph, first, all
runtimes for each graph were normalized by the runtime of the slowest algorithm for that graph, and then the mean was
calculated for each algorithm. In the various bars, graphs were considered only if the runtimes for at least three algorithms
were less than the 25,000 seconds limit set.

Figure 3. Runtime plots of the new exact and heuristic algorithms. The third curve, labeled edges, shows the
number of edges in the graph divided by the clock frequency of the computing platform used in the experiment.
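
The accuracy percentages quoted for the heuristic can be reproduced from Table 3, assuming that accuracy means the ratio of the clique size returned by the heuristic to the true maximum clique size; the function name below is illustrative.

```python
def accuracy(heuristic_size, optimal_size):
    """Ratio of the heuristic's clique size to the maximum clique size."""
    return heuristic_size / optimal_size

# (heuristic size, maximum clique size) pairs read from Table 3.
table3_pairs = {
    "email-Enron": (18, 20),   # 18 / 20 = 0.90
    "as-Skitter": (66, 67),    # 66 / 67 is roughly 0.985
}

for graph, (found, optimal) in table3_pairs.items():
    print(f"{graph}: {100 * accuracy(found, optimal):.1f}%")
```

The lower end of the reported range corresponds to a row of Table 3 in which the heuristic returns 10 against a maximum clique size of 12, a ratio of roughly 0.83.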

Figure 2 provides an aggregated visual summary of the runtime trends of the various 
algorithms across the five categories of graphs in the testbed. 
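
The aggregation behind Figure 2 (normalize each graph's runtimes by the slowest algorithm on that graph, then average per algorithm) can be sketched as follows. The function name and data layout are assumptions, the sample runtimes are two rows taken from Table 3, and the filter on the 25,000-second limit mentioned in the caption is omitted.

```python
def normalized_mean_runtimes(runtimes):
    """Map each algorithm to its mean runtime after per-graph normalization.

    `runtimes` has the form {graph: {algorithm: seconds}} (illustrative layout).
    """
    totals, counts = {}, {}
    for per_algorithm in runtimes.values():
        slowest = max(per_algorithm.values())
        for algorithm, seconds in per_algorithm.items():
            totals[algorithm] = totals.get(algorithm, 0.0) + seconds / slowest
            counts[algorithm] = counts.get(algorithm, 0) + 1
    return {alg: totals[alg] / counts[alg] for alg in totals}

# Two rows of Table 3 (the second row's graph name is not legible in the table).
sample = {
    "email-Enron": {"CP": 7.005, "cliquer": 15.08, "MCQD+CS": 3.70, "new-exact": 0.998},
    "second-row": {"CP": 256.37, "cliquer": 215.18, "MCQD+CS": 49.79, "new-exact": 0.38},
}
means = normalized_mean_runtimes(sample)
```

A normalized value of 1.0 means the algorithm was the slowest on every graph considered; smaller values indicate faster relative performance.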

To give a sense of runtime growth rates, we provide in Figure 3 plots of the runtime of 
the new exact algorithm and the heuristic for the synthetic and real-world graphs in the 
testbed. Besides the curves corresponding to the runtimes of the exact algorithm and the 
heuristic, the figures also include a curve corresponding to the number of edges in the 
graph divided by the clock frequency of the computing platform used in the experiment. 
This curve is added to facilitate comparison between the growth rate of the algorithms with 
that of a linear-time (in the size of the graph) growth rate. It can be seen that the runtime 
of the heuristic by and large grows somewhat linearly with the size of a graph. The exact 
algorithm's runtime, which is orders of magnitude larger than the heuristic, exhibited a 
similar growth behavior for these test-cases (although its worst-case complexity suggests 
exponential growth). 

4.3. Example of an application in social network analysis

We conclude this section on experiments with a small example demonstrating the appli- 
cation of the clique algorithms for detecting overlapping communities in social networks. 
In many real networks vertices may belong to more than one group, and such groups form 
overlapping communities. Classical examples are social networks, where an individual 
usually belongs to different circles at the same time, from that of work colleagues to fam- 
ily, sport associations, etc. Finding overlapping communities is a challenging problem 
[15]. Clique algorithms are one way in which a solution can be found. 

Figure 4. Some Facebook communities detected by our max clique heuristic.

For our small experiment, we use data collected from Facebook¹. Every user on Facebook
has a wall, which is the user's profile space that allows the posting of messages,
often short or temporal notes, by other users. The user comments and user information
from specific walls are publicly available and we collected them using the Facebook API. We
constructed a graph with the walls as vertices. A user who has commented on two
walls indicates a connection between those walls, and we form an edge between them.
There could be many common users for each pair of walls, and so we assigned edge weights by
Jaccard index or similarity coefficient [23]. Once this is done for all walls, we retained only
those edges which have weights above a chosen threshold, indicating a strong correlation.
The threshold is a user's choice and decides both the size and the number of communities
found.
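
The graph construction can be sketched as below, assuming that each wall is represented by the set of users who commented on it and that the edge weight is the Jaccard index of the two commenter sets. The function names, the data layout, and the toy data are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index |a ∩ b| / |a ∪ b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_wall_graph(commenters, threshold):
    """Connect two walls when the Jaccard index of their commenters exceeds threshold.

    `commenters` maps each wall to the set of users who commented on it.
    Returns an adjacency dictionary: wall -> set of neighboring walls.
    """
    adj = {wall: set() for wall in commenters}
    for w1, w2 in combinations(commenters, 2):
        if jaccard(commenters[w1], commenters[w2]) > threshold:
            adj[w1].add(w2)
            adj[w2].add(w1)
    return adj

# Hypothetical toy data: walls A and B share two of four distinct commenters.
toy = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {9}}
graph = build_wall_graph(toy, threshold=0.4)
```

Raising the threshold removes weaker edges, which tends to produce fewer and smaller cliques.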

We modified our heuristic to retain the largest clique found containing each node.
The exact algorithm could also have been used instead of the heuristic for this purpose. We
chose the heuristic since it is much faster and, for this particular problem of community
detection, the accuracy of the size of the cliques formed is not critical.
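
One way such a modification might look is sketched below: the greedy single-path search is run from every vertex and the members of the resulting clique are stored per vertex. The pruning on the running maximum is left out here so that every vertex receives a clique; this choice, the function name, and the tie-breaking by vertex label are assumptions rather than a description of the authors' code.

```python
def greedy_clique_per_vertex(adj):
    """Map each vertex to the members of one clique found greedily around it.

    `adj` maps each vertex to the set of its neighbors (illustrative format).
    """
    deg = {v: len(adj[v]) for v in adj}
    cliques = {}
    for v in adj:
        members = {v}
        U = set(adj[v])                        # candidates adjacent to all members
        while U:
            u = max(U, key=lambda w: (deg[w], w))   # maximum-degree candidate
            members.add(u)
            U = (U - {u}) & adj[u]
        cliques[v] = members
    return cliques

# Hypothetical example: two triangles sharing vertex 2.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
communities = greedy_clique_per_vertex(two_triangles)
```

In this example vertex 2 appears in the clique recorded for vertex 0 and in the one recorded for vertex 3, which is the kind of overlap between communities that the experiment is after.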



http://www.facebook.com 

Figure 4 shows some of the cliques/communities detected. We see two isolated
communities, one for popular singers, and another for retail chains and products. We also see
a community for news channels and politics, and a community of MSNBC and popular
TV shows. The highlight of this experiment is that the clique algorithm allows a node to
be a member of more than one community, giving an overlapping community structure.
Although the "news channels and politics" and "MSNBC and TV shows" communities are not
directly related and have different members, they share a common member.

5. Conclusion 

We presented a new exact and a new heuristic algorithm for the maximum clique problem.
We performed extensive experiments on three broad categories of graphs comparing the
performance of our algorithms to the algorithms due to Carraghan and Pardalos (CP) [9],
Ostergard (cliquer) [27] and Konc and Janezic (MCQD+CS) [20]. For DIMACS benchmark
graphs and certain dense synthetic graphs (rmat_sd2), our new exact algorithm performs
comparably with the CP algorithm, but is slower than cliquer and MCQD+CS. For
large sparse graphs, both synthetic and real-world, our new algorithm runs several orders
of magnitude faster than the other three. The heuristic, which runs many orders of magnitude
faster than our exact algorithm and the others, gave the optimal solution for 83% of the
test cases, and when it was suboptimal, its accuracy ranged between 0.83 and 0.99.

The exact algorithm was in general found to be less successful on relatively dense 
graphs. An interesting line of investigation would be to study ways to overcome this. 
Another line for future work would be to characterize the class(es) of graphs for which 
the heuristic is expected to return near-optimal solution. 

Appendix



Table 1. P1, P2, P3, P4 and P5 are the number of vertices pruned in steps Pruning 1, 2, 3, 4, and 5 of Algorithm 1. An asterisk (*)
indicates that the algorithm did not terminate within 25,000 seconds for that instance. ω denotes the maximum clique size.

Table 2. Comparison of runtimes of algorithms [9] (CP), [27] (cliquer) and [20] (MCQD+CS) with that of our new exact
algorithm (T_new-exact) for DIMACS graphs. An asterisk (*) indicates that the algorithm did not terminate within 10,000
seconds for that instance. ω denotes the maximum clique size, ω_new-heuristic the maximum clique size found by our
heuristic, and T_new-heuristic its runtime.